PANcakes Team: A Composite System of Domain-Agnostic Features For Author Profiling

Authors

  • Pepa Gencheva
  • Martin Boyanov
  • Elena Deneva
  • Preslav Nakov
  • Georgi Georgiev
  • Yasen Kiprov
  • Ivan Koychev
Abstract

We present the system we built for the PAN-2016 Author Profiling Task [9]. The task was to predict the gender and the age group of a person given several samples of their writing, and it was offered for three languages: English, Spanish, and Dutch. We participated in both subtasks for all three languages. Our approach focused on extracting genre-agnostic features such as bag-of-words, sentiment and topic derivation, and stylistic features. We then used these features to train SVM-based classifiers, as implemented in LIBLINEAR for the gender classification subtask and in LIBSVM for the age classification subtask.
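To make the feature-extraction step concrete, here is a minimal sketch of how bag-of-words counts and a few stylistic statistics could be combined into one feature dictionary per author. It uses only the Python standard library. The function names, the `bow=`/`style=` key prefixes, and the three style statistics are illustrative assumptions, not the feature set used by the authors; sentiment and topic features are left out.

```python
import re
from collections import Counter

# Runs of Unicode letters; an assumed tokenizer that also handles accented
# characters in Spanish and Dutch text.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def bag_of_words(samples):
    """Count lower-cased word tokens over all writing samples of one author."""
    counts = Counter()
    for text in samples:
        counts.update(tok.lower() for tok in TOKEN_RE.findall(text))
    return counts


def stylistic_features(samples):
    """A few illustrative style statistics (hypothetical, not the paper's list)."""
    text = " ".join(samples)
    tokens = TOKEN_RE.findall(text)
    n_chars = max(len(text), 1)    # avoid division by zero on empty input
    n_tokens = max(len(tokens), 1)
    return {
        "avg_word_len": sum(len(t) for t in tokens) / n_tokens,
        "punct_ratio": sum(ch in ".,;:!?" for ch in text) / n_chars,
        "upper_ratio": sum(ch.isupper() for ch in text) / n_chars,
    }


def author_features(samples):
    """Merge both feature groups into one sparse dict for a single author."""
    features = {f"bow={w}": float(c) for w, c in bag_of_words(samples).items()}
    features.update(
        {f"style={k}": v for k, v in stylistic_features(samples).items()}
    )
    return features


if __name__ == "__main__":
    demo = author_features(["Hello world!", "hello again, World"])
    for name in sorted(demo):
        print(name, demo[name])
```

Dictionaries of this shape could then be turned into sparse vectors and passed to an SVM trainer. The abstract names LIBLINEAR for the gender subtask and LIBSVM for the age subtask, but the exact kernels and parameters are not given here.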

Similar resources

A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure

Author profiling is a text classification technique, which is used to predict the profiles of unknown text by analyzing their writing styles. Author profiles are the characteristics of the authors like gender, age, nativity language, country and educational background. The existing approaches for Author Profiling suffered from problems like high dimensionality of features and fail to capture th...

XRCE Personal Language Analytics Engine for Multilingual Author Profiling: Notebook for PAN at CLEF 2015

This technical notebook describes the methodology used – and results achieved – for the PAN 2015 Author Profiling Challenge by the team from Xerox Research Centre Europe (XRCE). This year, personality traits are introduced alongside age and gender in a corpus of tweets in four languages – English, Spanish, Italian and Dutch. We describe a largely language agnostic methodology for classification...

Computing with living hardware

Our multi-institutional team of eleven undergraduates, one high school student, one postdoctoral fellow, and four faculty members explored the emerging field of synthetic biology and presented our results at the 2006 international Genetically Engineered Machine (iGEM) competition. Having had little or no previous research experience, biology, chemistry and mathematics students from four differe...

Automatic Generation of a Multi Agent System for Crisis Management by a Model Driven Approach

Considering the increasing occurrences of unexpected events and the need for pre-crisis planning in order to reduce risks and losses, modeling instant response environments is needed more than ever. Modeling may lead to more careful planning for crisis-response operations, such as team formation, task assignment, and doing the task by teams. A common challenge in this way is that the model shou...

SeerNet@INLI-FIRE-2017: Hierarchical Ensemble for Indian Native Language Identification

Native Language Identification has played an important role in forensics primarily for author profiling and identification. In this work, we discuss our approach to the shared task of Indian Language Identification. The task is primarily to identify the native language of the writer from the given XML file which contains a set of Facebook comments in the English language. We propose a hierarchi...

Publication date: 2016